13. Exercise: Duplicate Rows

In this exercise, you'll open a data file and remove the duplicate rows.

Your data file is in comma-separated values (CSV) format. The file extension is .csv and, unlike .xlsx and .xls files, a CSV file can be read as plain text. The first row may contain the column headers, and every later row is a data row. Within each row, the values for the columns are separated by commas.

Spreadsheet applications can open this type of file directly, and it is often used for storing tabular data. For example, opening the exercise file, worldcities.csv, in a plain text editor such as Microsoft Notepad on Windows or TextEdit on a Mac shows the following text:

City,Country
Rovaniemi,Finland
Steinkjer,Norway
Monterey,United States of America
Kuta,Indonesia
Lovec,Bulgaria
Moosonee,Canada
Gulkana,United States of America
Starorybnoye,Russia
Amol,Iran
Karema,Tanzania
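
The header-then-data layout described above can also be seen programmatically. The following is a minimal Python sketch, not part of the exercise itself, that parses the first few rows of the sample with the standard csv module. The rows are inlined so the sketch runs without the exercise file, and the variable names are illustrative only.

```python
import csv
import io

# First rows of the sample shown above, inlined so the sketch
# does not depend on worldcities.csv being present.
sample = """City,Country
Rovaniemi,Finland
Steinkjer,Norway
Monterey,United States of America
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)   # first row: the column headers
rows = list(reader)     # every later row: one list of values per data row

print(header)
for city, country in rows:
    print(f"{city} is in {country}")
```

To read the actual file instead, the same reader could be built from open("worldcities.csv", newline=""), assuming the file sits in the working directory.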

If you open the same file with a spreadsheet application such as Excel, the application automatically places each comma-separated value in its own column.

Task Description:

The following list has a series of steps for this exercise. As you complete each step, check it off the list. The quizzes in the task list can be found below.

Task List:

How many duplicates?

QUESTION:

How many duplicates did you remove?

SOLUTION:

NOTE: The solutions are expressed as RegEx patterns. Udacity uses these patterns to check the given answer.
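
To cross-check a duplicate count outside the spreadsheet, one option is a short Python sketch like the one below. It assumes that a duplicate means a row whose values all exactly match an earlier row, and it keeps the first occurrence. The helper name split_duplicates and the sample rows are made up for illustration; the repeats are not taken from the exercise file.

```python
import csv
import io

def split_duplicates(rows):
    """Return (unique_rows, duplicate_count), keeping each row's first occurrence."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)          # lists are not hashable, tuples are
        if key in seen:
            continue              # exact repeat of an earlier row: skip it
        seen.add(key)
        unique.append(row)
    return unique, len(rows) - len(unique)

# Illustrative data only: two rows are repeated on purpose.
sample = """City,Country
Rovaniemi,Finland
Steinkjer,Norway
Rovaniemi,Finland
Kuta,Indonesia
Steinkjer,Norway
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)             # skip the header row before comparing data rows
unique_rows, removed = split_duplicates(list(reader))

print(removed)       # 2
print(unique_rows)
```

This comparison is exact and case-sensitive. A spreadsheet's own duplicate removal may compare values differently, for example by ignoring letter case, so the two counts are not guaranteed to agree.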